Search for: All records
Total Resources: 4
- Author / Contributor: Filter by Author / Creator
  - Day, Hannah (3)
  - Guo, Xin (2)
  - Li, Yang (2)
  - Milton, Kimball A. (2)
  - Cavero-Peláez, Inés (1)
  - Colin, Theotime (1)
  - Fulling, Stephen A. (1)
  - Gaines Day, Hannah R. (1)
  - Johnson, Reed M. (1)
  - Kahn, Yonatan (1)
  - Kennedy, Gerard (1)
  - McMinn-Sauder, Harper B. (1)
  - Meikle, William G. (1)
  - Parashar, Prachi (1)
  - Quinlan, Gabriela (1)
  - Roberts, Daniel A. (1)
  - Smart, Autumn (1)
  - Sponsler, Douglas B. (1)
- Abstract: Fully-connected deep neural networks with weights initialized from independent Gaussian distributions can be tuned to criticality, which prevents the exponential growth or decay of signals propagating through the network. However, such networks still exhibit fluctuations that grow linearly with the depth of the network, which may impair the training of networks whose width is comparable to their depth. We show analytically that rectangular networks with tanh activations and weights initialized from the ensemble of orthogonal matrices have preactivation fluctuations that are independent of depth, to leading order in inverse width. Moreover, we demonstrate numerically that, at initialization, all correlators involving the neural tangent kernel (NTK) and its descendants at leading order in inverse width (which govern the evolution of observables during training) saturate, rather than growing without bound as they do for Gaussian initializations. We speculate that this structure preserves finite-width feature learning while reducing overall noise, thus improving both generalization and training speed in deep networks with depth comparable to width. We provide some experimental justification by relating empirical measurements of the NTK to the superior performance of deep non-linear orthogonal networks trained under full-batch gradient descent on the MNIST and CIFAR-10 classification tasks. (Free, publicly accessible full text available August 7, 2026.)
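The abstract above describes initializing a rectangular tanh network with orthogonal weight matrices so that signal norms, and hence preactivation fluctuations, do not drift with depth. The following is a minimal NumPy sketch of that setup, not the authors' code: the width, depth, unit weight scale, and Haar-orthogonal sampling via QR are illustrative assumptions.

```python
import numpy as np

def orthogonal_matrix(n, rng):
    """Sample an n x n orthogonal matrix: QR-decompose a Gaussian matrix and
    fix the column signs so the result is Haar-distributed."""
    q, r = np.linalg.qr(rng.standard_normal((n, n)))
    return q * np.sign(np.diag(r))

def preactivation_norms(depth, width, rng):
    """Propagate one input through a rectangular tanh network with orthogonal
    weights, recording the mean squared preactivation at every layer."""
    h = rng.standard_normal(width)  # O(1) entries per neuron
    norms = []
    for _ in range(depth):
        z = orthogonal_matrix(width, rng) @ h  # preactivation; orthogonal W preserves the norm
        norms.append(float(np.mean(z ** 2)))
        h = np.tanh(z)
    return norms

rng = np.random.default_rng(0)
print(preactivation_norms(depth=64, width=256, rng=rng)[:5])
```

For comparison with the Gaussian baseline discussed in the abstract, `orthogonal_matrix(width, rng)` can be swapped for `rng.standard_normal((width, width)) / np.sqrt(width)`, i.e. independent Gaussian weights with variance 1/width.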
- McMinn-Sauder, Harper B.; Colin, Theotime; Gaines Day, Hannah R.; Quinlan, Gabriela; Smart, Autumn; Meikle, William G.; Johnson, Reed M.; Sponsler, Douglas B. (Apidologie)
- Milton, Kimball A.; Day, Hannah; Li, Yang; Guo, Xin; Kennedy, Gerard (Physical Review Research)
- Parashar, Prachi; Milton, Kimball A.; Li, Yang; Day, Hannah; Guo, Xin; Fulling, Stephen A.; Cavero-Peláez, Inés (Physical Review D)
